For processing data with a grid-like or array topology:
1-D convolution: time-series data, sensor signal data
2-D convolution: image data
3-D convolution: video data
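To make the 1-D case concrete, here is a minimal sketch of a 1-D convolution (valid padding, stride 1) over a sensor signal in plain Python; the filter weights and readings are made up for illustration.

```python
def conv1d(signal, kernel):
    """Slide the kernel over the signal; each output is a dot product."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

# A moving-difference filter highlights changes in a sensor reading.
reading = [0.0, 0.0, 1.0, 1.0, 0.0]
print(conv1d(reading, [1.0, -1.0]))  # -> [0.0, -1.0, 0.0, 1.0]
```

The same sliding-window idea generalizes to 2-D (images) and 3-D (video) by adding spatial and temporal axes.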
What hyperparameters do we have in a CNN model?
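The main convolution hyperparameters are the kernel (filter) size, stride, padding, and the number of filters. How the first three determine the output length can be checked with the standard formula; this is a framework-independent sketch.

```python
def conv_output_len(n_in, kernel, stride=1, padding=0):
    # Standard formula: floor((n_in + 2*padding - kernel) / stride) + 1
    return (n_in + 2 * padding - kernel) // stride + 1

print(conv_output_len(10, kernel=3))             # -> 8
print(conv_output_len(10, kernel=3, padding=1))  # -> 10 (length-preserving)
print(conv_output_len(10, kernel=3, stride=2))   # -> 4
```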
Main CNN idea for text:
Compute vectors for n-grams and group them afterwards
Example: for “this takes too long”, compute vectors for:
bigrams: this takes, takes too, too long; trigrams: this takes too, takes too long; 4-gram: this takes too long
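The n-gram idea above can be sketched in plain Python: slide a window of size n over the word embeddings of “this takes too long”, score each window with a filter, then group the scores with max-over-time pooling. The embedding values and filter weights here are illustrative, not learned.

```python
# Toy 2-dimensional embeddings (made-up values for illustration).
emb = {
    "this":  [0.1, 0.3],
    "takes": [0.2, 0.1],
    "too":   [0.4, 0.2],
    "long":  [0.3, 0.5],
}
sentence = ["this", "takes", "too", "long"]

def ngram_scores(words, n, filt):
    """One score per n-gram: dot product of the concatenated window with the filter."""
    scores = []
    for i in range(len(words) - n + 1):
        window = [x for w in words[i:i + n] for x in emb[w]]
        scores.append(sum(a * b for a, b in zip(window, filt)))
    return scores

bigram_filter = [1.0, 0.0, 0.0, 1.0]         # one weight per embedding dimension
scores = ngram_scores(sentence, 2, bigram_filter)  # one score per bigram
feature = max(scores)                         # max-over-time pooling groups them
```

In a real model there are many filters per n-gram size, each producing one pooled feature; the features are concatenated and fed to a classifier.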
MR: Movie reviews with one sentence per review. Classification involves detecting positive/negative reviews (Pang and Lee, 2005). url: https://www.cs.cornell.edu/people/pabo/movie-review-data/
SST-1: Stanford Sentiment Treebank—an extension of MR but with train/dev/test splits provided and fine-grained labels (very positive, positive, neutral, negative, very negative), re-labeled by Socher et al. (2013). url: https://nlp.stanford.edu/sentiment/
SST-2: Same as SST-1 but with neutral reviews removed and binary labels.
Subj: Subjectivity dataset where the task is to classify a sentence as being subjective or objective (Pang and Lee, 2004).
TREC: TREC question dataset—task involves classifying a question into one of 6 question types (e.g., whether the question asks about a person, a location, numeric information, etc.) (Li and Roth, 2002). url: https://cogcomp.seas.upenn.edu/Data/QA/QC/
CR: Customer reviews of various products (cameras, MP3 players, etc.). Task is to predict positive/negative reviews (Hu and Liu, 2004). url: https://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
MPQA: Opinion polarity detection subtask of the MPQA dataset (Wiebe et al., 2005). url: https://mpqa.cs.pitt.edu/corpora/mpqa_corpus/
A transformer adopts an encoder-decoder architecture.
Transformers were developed for sequence transduction problems such as neural machine translation: any task that transforms an input sequence into an output sequence.
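At the core of both the encoder and the decoder is scaled dot-product attention, softmax(QKᵀ/√d_k)V. Below is a minimal single-head sketch in plain Python; the Q, K, V values are illustrative only.

```python
import math

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V, with matrices as lists of rows."""
    d_k = len(K[0])
    out = []
    for q in Q:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in K]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Output is the attention-weighted average of the value rows.
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]                      # one query
K = [[1.0, 0.0], [0.0, 1.0]]          # two keys
V = [[1.0, 2.0], [3.0, 4.0]]          # two value rows
print(attention(Q, K, V))             # output lies between the two value rows
```

Since the query matches the first key more strongly, the output is pulled toward the first value row.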
More details on the architecture and implementation:
There’s increasing evidence that pretrained models learn a wide variety of things about the statistical properties of language:
Write with Transformer: https://transformer.huggingface.co/
Talk to Transformer: https://app.inferkit.com/demo
Transformer model for language understanding: https://www.tensorflow.org/text/tutorials/transformer
Pretrained models: https://huggingface.co/transformers/pretrained_models.html
These models are still not well-understood